Accessing geospatial data the easy way (Python)

Access to geospatial data has changed significantly over the past decade. Traditionally, data were accessed by downloading many files to a local computer and then analyzing them with desktop software or a programming language. Obtaining analysis-ready datasets has always been difficult because of the diversity of data formats (NetCDF, GRIB2, GeoTIFF, Shapefile, etc.) and the variety of access protocols used by different providers (OPeNDAP, HTTPS, SFTP, WPS, REST APIs, data marts, etc.). Moreover, as geospatial datasets keep growing, many modern datasets no longer fit on a local computer at all, which limits scientific progress.

The datasets presented here are large-scale, analysis-ready, cloud-optimized (ARCO). To provide a single entry point to these datasets, we followed the methodology developed by the Pangeo community, which combines several technologies:
- Data lake (e.g. S3, Azure Data Lake Storage, GCS): distributed object storage
- Zarr (or alternatively TileDB, COGs): chunked N-dimensional array formats
- Dask (or alternatively Spark, Ray, Distributed): distributed computing and lazy loading
- Intake catalogs (or alternatively STAC): a general interface for loading different data formats, mostly but not limited to spatiotemporal assets
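To illustrate what a chunked array format on object storage means, the toy sketch below splits a NumPy array into chunks and stores each one as a separate object keyed by its chunk index, using the same "i.j.k" key pattern as Zarr version 2. It is not how Zarr is implemented (Zarr also compresses chunks, pads edge chunks and writes JSON metadata), and `to_chunk_store` is a hypothetical helper. The point is that a reader only has to fetch the objects that overlap its query.

```python
import itertools
import numpy as np

def to_chunk_store(arr, chunks):
    """Toy chunked store: one bytes object per chunk, keyed 'i.j.k' like Zarr v2."""
    grid = [range(-(-n // c)) for n, c in zip(arr.shape, chunks)]  # ceil(n / c) chunks per axis
    store = {}
    for idx in itertools.product(*grid):
        region = tuple(slice(i * c, (i + 1) * c) for i, c in zip(idx, chunks))
        store[".".join(map(str, idx))] = arr[region].tobytes()
    return store

# 48 hourly steps on a 10 x 10 grid, chunked (24, 5, 5) -> 2 x 2 x 2 = 8 objects
data = np.arange(48 * 10 * 10, dtype="float32").reshape(48, 10, 10)
store = to_chunk_store(data, (24, 5, 5))
print(sorted(store))  # ['0.0.0', '0.0.1', '0.1.0', '0.1.1', '1.0.0', '1.0.1', '1.1.0', '1.1.1']

# The full time series at grid cell (y=7, x=2) lives in spatial chunk (1, 0) only
needed = [k for k in store if k.endswith(".1.0")]
series = np.concatenate([
    np.frombuffer(store[key], dtype="float32").reshape(24, 5, 5)[:, 7 % 5, 2 % 5]
    for key in sorted(needed)
])
```

Only 2 of the 8 objects are read to rebuild that time series; the other 6 are never touched.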

For more information, please refer to the Pangeo website (pangeo.io).

Keep in mind that most datasets in the catalog use language-agnostic formats, so they can be read from any programming language (Python, Julia, JavaScript, C, etc.) that implements the corresponding specifications (Zarr, NetCDF via Kerchunk references, GeoJSON, etc.).
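To see why such formats are language-agnostic: a Zarr (version 2) array consists of a JSON metadata object (".zarray"), an optional JSON attributes object (".zattrs") and one binary object per chunk, so any language with a JSON parser and the right decompressor can read it. The sketch below builds illustrative metadata for an array shaped like the ERA5 `t2m` variable shown later in this notebook. Shape, chunks and dtype come from that dataset; the compressor settings are an assumption, not read from the actual store.

```python
import json
import math

# Illustrative Zarr v2 ".zarray" metadata; the compressor block is an assumption
zarray = {
    "zarr_format": 2,
    "shape": [368184, 721, 1440],    # (time, latitude, longitude)
    "chunks": [14880, 15, 15],       # time-series chunking
    "dtype": "<f4",                   # little-endian float32
    "compressor": {"id": "blosc", "cname": "lz4", "clevel": 5, "shuffle": 1, "blocksize": 0},
    "fill_value": "NaN",              # NaN is written as a string in JSON
    "order": "C",
    "filters": None,
}
# xarray records dimension names in ".zattrs" under "_ARRAY_DIMENSIONS"
zattrs = {"_ARRAY_DIMENSIONS": ["time", "latitude", "longitude"]}

text = json.dumps(zarray)            # plain JSON, parseable from any language
meta = json.loads(text)
n_chunks = [math.ceil(s / c) for s, c in zip(meta["shape"], meta["chunks"])]
print(n_chunks, math.prod(n_chunks))  # [25, 49, 96] 117600
```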

[1]:
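from dask.distributed import Client  # local parallel cluster
import intake                        # data catalogs
import hvplot.xarray                 # adds .hvplot to xarray objects
import hvplot.pandas                 # adds .hvplot to pandas objects
import xoak                          # adds the .xoak accessor (nearest-neighbour selection)
import xarray as xr
import numpy as np
import pandas as pd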

Dask client

We start a Dask client so that all subsequent Dask-compatible computations (for example, xarray operations on Dask-backed arrays) run in parallel.
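A minimal sketch of the lazy, chunked evaluation that Dask provides (the array size and chunk shape are arbitrary). Operations only build a task graph; nothing is computed until `.compute()` is called. When a distributed `Client` exists, as created in the next cell, it registers itself as the default scheduler and the computation runs on its workers; without one, Dask falls back to a local thread pool.

```python
import dask.array as da

# 2000 x 2000 array split into 500 x 500 blocks: 16 chunks, nothing allocated yet
x = da.ones((2_000, 2_000), chunks=(500, 500))
total = x.sum()           # still lazy: only a task graph is built
print(x.numblocks)        # (4, 4)
print(total.compute())    # 4000000.0, evaluated chunk by chunk
```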

[2]:
client = Client()
client
[2]:

Client

Client-fecdb5ed-66a3-11ed-89d7-000d3ae5305f

Connection method: Cluster object
Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

Intake catalogs

Intake is a lightweight package for finding, investigating, loading and disseminating data. A catalog organizes a collection of datasets, and each dataset is associated with a parameterized data loader (driver) that opens it in the format expected by the end user. In Python, for example, multi-dimensional arrays are opened with xarray-based drivers, while polygons (Shapefile, GeoJSON) are opened with geopandas.
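To make the catalog/driver idea concrete, here is a toy model in plain Python. It is not Intake's API: the dictionary layout, driver names and URLs are hypothetical, and only the entry names are taken from the catalog listed below. With Intake itself, `cat.atmosphere.era5_reanalysis_single_levels.to_dask()` performs the equivalent lookup and actually opens the data.

```python
# Hypothetical two-level catalog: each entry names a driver and its arguments
CATALOG = {
    "atmosphere": {
        "era5_reanalysis_single_levels": {
            "driver": "zarr", "args": {"urlpath": "s3://example-bucket/era5.zarr"}},
    },
    "geography": {
        "melcc_polygons": {
            "driver": "geojson", "args": {"urlpath": "https://example.org/basins.geojson"}},
    },
}

# Each driver knows which library opens its format; here we only describe the call
DRIVERS = {
    "zarr": lambda urlpath: f"xarray.open_zarr({urlpath!r})",
    "geojson": lambda urlpath: f"geopandas.read_file({urlpath!r})",
}

def open_entry(catalog, field, name):
    """Look up an entry and pass its parameters to the matching driver."""
    entry = catalog[field][name]
    return DRIVERS[entry["driver"]](**entry["args"])

print(open_entry(CATALOG, "atmosphere", "era5_reanalysis_single_levels"))
# xarray.open_zarr('s3://example-bucket/era5.zarr')
```

The user only needs the entry name; the format, location and loader are described once, in the catalog.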

Here is the URL where you can access the catalog:

[3]:
catalog_url = 'https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml'
cat=intake.open_catalog(catalog_url)
cat
main:
  args:
    path: https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml
  description: Master Data Catalog
  driver: intake.catalog.local.YAMLFileCatalog
  metadata: {}

To organize the collection of datasets, the main catalog references several sub-catalogs:

[4]:
# Iterating over a catalog yields the names of its entries (here, the sub-catalogs)
[cat[field] for field in cat]
[4]:
[<Intake catalog: hydrology>,
 <Intake catalog: atmosphere>,
 <Intake catalog: geography>,
 <Intake catalog: climate_change>]

The catalog is still growing, but several datasets are already available. The following sections show example queries and analyses for some of them.

The table below lists the current (flattened) catalog. Check the status field before using a dataset. A “dev” status means we are still actively working on the dataset and do not recommend using it. A “prod” status means the dataset has undergone quality review and testing; errors are still possible, however, so users should always validate the data for their own application.

[5]:
pd.set_option('display.max_colwidth', None)

# Flatten the two-level catalog (sub-catalog -> dataset) into a single table;
# iterating over a catalog yields the names of its entries
pd.DataFrame([[field,
               dataset,
               cat[field][dataset].describe()['description'],
               cat[field][dataset].describe()['metadata']['status'][0]]
              for field in cat
              for dataset in cat[field]],
             columns=['field', 'dataset_name', 'description', 'status']) \
.sort_values('field')
[5]:
index | field | dataset_name | description | status
1 | atmosphere | era5_reanalysis_single_levels | ERA5 hourly estimates of variables on single levels chunked for time series analysis | prod
2 | atmosphere | era5_reanalysis_single_levels_spatial | ERA5 hourly estimates of variables on single levels chunked for spatial analysis | dev
3 | atmosphere | era5_land_reanalysis_spatial | ERA5-Land hourly estimates on single level chunked for spatial analysis | dev
4 | atmosphere | era5_reanalysis_pressure_levels | ERA5 hourly estimates of variables on pressure levels | prod
5 | atmosphere | daymet_daily_na | Daymet Data Version 4.0 | prod
6 | atmosphere | ghcnd_world | Global Historical Climatology Network daily (GHCNd) | dev
7 | atmosphere | scdna | SCDNA a serially complete precipitation and temperature dataset for North America from 1979 to 2018 | prod
8 | atmosphere | 20_century_reanalysis_single_levels | NOAA-CIRES-DOE Twentieth Century Reanalysis (20CR) on single levels spanning 1836 to 2015 chunked for time series analysis | prod
9 | atmosphere | 20_century_reanalysis_single_levels_large_area | NOAA-CIRES-DOE Twentieth Century Reanalysis (20CR) on single levels spanning 1836 to 2015 chunked for spatial analysis | prod
10 | atmosphere | 20_century_reanalysis_pressure_levels | NOAA-CIRES-DOE Twentieth Century Reanalysis (20CR) on pressure levels spanning 1836 to 2015 chunked for time series analysis | prod
11 | atmosphere | 20_century_reanalysis_pressure_levels_large_area | NOAA-CIRES-DOE Twentieth Century Reanalysis (20CR) on pressure levels spanning 1836 to 2015 chunked for spatial analysis | prod
12 | atmosphere | terraclimate | TerraClimate is a dataset of monthly climate and climatic water balance for global terrestrial surfaces from 1958-2019 | prod
14 | climate_change | rcp45_day_NAM_22i_raw_zarr | NA-Cordex (limited to rcp45 for now... more to come!) | dev
13 | geography | melcc_polygons | MELCC basin delimitation | dev
0 | hydrology | melcc | CEHQ daily flow and water levels | dev
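Since the status column decides whether a dataset is ready for use, it is natural to filter on it. A minimal pandas sketch using a few rows copied from the table above; in the notebook you would assign the DataFrame built in cell [5] to a variable (the name `catalog_df` is hypothetical) and filter it the same way.

```python
import pandas as pd

# A few rows copied from the catalog table above
catalog_df = pd.DataFrame(
    [["atmosphere", "era5_reanalysis_single_levels", "prod"],
     ["atmosphere", "era5_reanalysis_single_levels_spatial", "dev"],
     ["atmosphere", "daymet_daily_na", "prod"],
     ["geography", "melcc_polygons", "dev"],
     ["hydrology", "melcc", "dev"]],
    columns=["field", "dataset_name", "status"],
)

# Keep only datasets flagged as production-ready
prod = catalog_df[catalog_df["status"] == "prod"]
print(prod["dataset_name"].tolist())
# ['era5_reanalysis_single_levels', 'daymet_daily_na']
```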

1) Atmosphere datasets

a) ERA5 single levels

ERA5 is the fifth generation ECMWF atmospheric reanalysis of the global climate covering the period from January 1950 to present. ERA5 is produced by the Copernicus Climate Change Service (C3S) at ECMWF.

Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product.
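The "optimal" combination of a previous forecast with new observations can be illustrated with the simplest possible case: one scalar quantity, one background (forecast) value and one observation, each with a known error variance. The sketch computes the minimum-variance weighted average (the scalar form of optimal interpolation). It only illustrates the principle; ERA5 itself relies on a far more elaborate 4D-Var system, and the numbers below are made up.

```python
def analysis(x_b, var_b, y, var_o):
    """Scalar optimal interpolation: blend background x_b with observation y.

    var_b and var_o are the error variances of the background and the observation.
    Returns the analysis value and its error variance.
    """
    k = var_b / (var_b + var_o)      # gain: weight given to the observation
    x_a = x_b + k * (y - x_b)        # background corrected by the weighted innovation
    var_a = (1 - k) * var_b          # analysis is more certain than either input
    return x_a, var_a

# Made-up example: forecast 20.0 degC (variance 4.0), observation 22.0 degC (variance 1.0)
x_a, var_a = analysis(20.0, 4.0, 22.0, 1.0)
print(round(x_a, 3), round(var_a, 3))  # 21.6 0.8
```

The more accurate source (here the observation) pulls the analysis towards itself, and the analysis error variance is smaller than that of either input.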

Property | Values
Temporal extent | 01/01/1979 – 12/31/2020
Spatial extent | World: [-180, 180, -90, 90] (lon_min, lon_max, lat_min, lat_max)
Chunks (time-series version) | {'time': 14880, 'longitude': 15, 'latitude': 15}
Chunks (spatial version) | {'time': 24, 'longitude': 1440, 'latitude': 721}
Spatial resolution | 0.25 degrees
Spatial reference | WGS84 (EPSG:4326)
Temporal resolution | 1 hour
Update frequency | Weekly (planned for 2023)

Data access

[6]:
ds=cat.atmosphere.era5_reanalysis_single_levels.to_dask()
ds
[6]:
<xarray.Dataset>
Dimensions:    (latitude: 721, longitude: 1440, time: 368184)
Coordinates:
  * latitude   (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
  * longitude  (longitude) float32 -180.0 -179.8 -179.5 ... 179.2 179.5 179.8
  * time       (time) datetime64[ns] 1979-01-01 ... 2020-12-31T23:00:00
Data variables:
    t2m        (time, latitude, longitude) float32 dask.array<chunksize=(14880, 15, 15), meta=np.ndarray>
    tp         (time, latitude, longitude) float32 dask.array<chunksize=(14880, 15, 15), meta=np.ndarray>
Attributes:
    institution:  ECMWF
    source:       Reanalysis
    title:        ERA5 forecasts

Working with the data

xarray makes it quick to select subsets in both space and time. Here we select July 19–20, 1996, when Quebec (Canada) experienced historically extreme precipitation, and mask hourly totals below 0.001 m (1 mm; ERA5 precipitation is stored in metres). The plotting package hvplot is then used to follow the storm throughout the event.
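ERA5 stores latitude from north to south (90 down to -90, as the dataset above shows), which is why the latitude slice in the next cell is written `slice(60, 35)`. With a descending coordinate, a slice written in ascending order silently returns an empty selection. A small sketch on a synthetic 1-degree grid (the grid and the zeros are made up; only the coordinate ordering mirrors ERA5):

```python
import numpy as np
import xarray as xr

# Synthetic 1-degree global grid ordered like ERA5: latitude descends, longitude ascends
lat = np.linspace(90, -90, 181)
lon = np.linspace(-180, 179, 360)
da = xr.DataArray(np.zeros((lat.size, lon.size), dtype="float32"),
                  coords={"latitude": lat, "longitude": lon},
                  dims=("latitude", "longitude"))

ok = da.sel(latitude=slice(60, 35), longitude=slice(-90, -50))     # north -> south: works
empty = da.sel(latitude=slice(35, 60), longitude=slice(-90, -50))  # wrong order: empty
print(ok.sizes["latitude"], ok.sizes["longitude"], empty.sizes["latitude"])  # 26 41 0
```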

[7]:
%%time

da = ds.tp \
.sel(time=slice('1996-07-19','1996-07-20'),
     longitude=slice(-90,-50),
     latitude=slice(60,35))

da \
.where(da>=0.001) \
.load() \
.hvplot(groupby='time',
        widget_type='scrubber',
        widget_location='bottom',
        cmap='gist_ncar',
        tiles='ESRI',
        geo=True,
        clim=(0.001, 0.005),
        width=750,
        height=400)
CPU times: user 4.79 s, sys: 354 ms, total: 5.15 s
Wall time: 11.2 s
[7]:

Because this Zarr version of ERA5 is chunked for time-series analysis (long time chunks over small spatial tiles), the full record over a small area, such as a point or a small polygon, can be extracted quickly. With a collection of NetCDF files, each typically holding a short period over the whole domain, the same query must open and read every file, which is extremely slow for large datasets.
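The effect of chunking can be estimated with simple arithmetic. The sketch counts how many chunks (and roughly how many uncompressed bytes) two queries must read under the two ERA5 chunkings listed in the table above: the full record at one grid cell, and the two-day storm map from the previous cell. It assumes the grid shown in the dataset output (hourly from 1979-01-01T00, latitude descending from 90 and longitude ascending from -180, at 0.25°), float32 values and no caching; compression would shrink the byte counts but not the chunk counts. The spatial chunking behaves much like a collection of per-time-step NetCDF files.

```python
from datetime import datetime, timedelta
from math import prod

def n_chunks(start, stop, chunk):
    """Number of chunks of length `chunk` overlapped by the index range [start, stop)."""
    return (stop - 1) // chunk - start // chunk + 1

def chunk_mb(chunks, itemsize=4):
    """Uncompressed size of one chunk in MB (float32 = 4 bytes per value)."""
    return prod(chunks) * itemsize / 1e6

NT = 368184                      # hourly steps, 1979-01-01T00 ... 2020-12-31T23
TS = (14880, 15, 15)             # (time, latitude, longitude) time-series chunks
SP = (24, 721, 1440)             # spatial chunks

# Query 1: full record at a single grid cell (one spatial tile, every time chunk)
ts_point = n_chunks(0, NT, TS[0])
sp_point = n_chunks(0, NT, SP[0])

# Query 2: 48 hours from 1996-07-19 over 60N-35N, 90W-50W, as grid indices
t0 = (datetime(1996, 7, 19) - datetime(1979, 1, 1)) // timedelta(hours=1)
t1 = t0 + 48
lat0, lat1 = int((90 - 60) / 0.25), int((90 - 35) / 0.25) + 1     # latitude descends from 90
lon0, lon1 = int((-90 + 180) / 0.25), int((-50 + 180) / 0.25) + 1

def event_chunks(ch):
    """Chunks touched by the storm map for a given chunk shape."""
    return (n_chunks(t0, t1, ch[0]) * n_chunks(lat0, lat1, ch[1])
            * n_chunks(lon0, lon1, ch[2]))

ts_event, sp_event = event_chunks(TS), event_chunks(SP)

for name, ch, point, event in [("time-series", TS, ts_point, ts_event),
                               ("spatial", SP, sp_point, sp_event)]:
    mb = chunk_mb(ch)
    print(f"{name}: {mb:.1f} MB/chunk | point record: {point} chunks (~{point * mb / 1e3:,.1f} GB)"
          f" | storm map: {event} chunks (~{event * mb / 1e3:,.1f} GB)")
```

Under these assumptions the point record touches 25 chunks with time-series chunking versus 15,341 with spatial chunking, while the two-day map touches 77 versus 2: each layout is fast for one kind of query and slow for the other.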

[8]:
%%time
# ERA5 total precipitation (tp) is stored in metres; multiply by 1000 for millimetres
da = (1000*ds.tp) \
.sel(longitude=-75,
     latitude=45,
     method='nearest')

da.hvplot(grid=True, width=800, height=500, color='blue')
CPU times: user 231 ms, sys: 48.8 ms, total: 280 ms
Wall time: 1.75 s
[8]:
[9]:
%%time
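# Annual precipitation totals (mm) at the grid cell nearest to 45°N, 75°W
da = (1000*ds.tp) \
.sel(longitude=-75,
     latitude=45,
     method='nearest') \
.resample(time='1Y') \
.sum()

# Overlay a line and markers with the * operator
da.hvplot.line(grid=True, width=800, height=500, color='blue')* \
da.hvplot.scatter(marker='o').opts(color='black', size=14)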
CPU times: user 1.02 s, sys: 34.1 ms, total: 1.05 s
Wall time: 3.17 s
[9]:

b) ERA5 pressure levels

This dataset comes from the same ERA5 reanalysis described in section a), but provides variables on pressure levels (300 to 1000 hPa) over a regional domain.

Property | Values
Temporal extent | 01/01/1979 – 12/31/2019
Spatial extent | Atlantic Northeast: [-96, -52, 40, 63] (lon_min, lon_max, lat_min, lat_max)
Chunks | {'time': 8760, 'longitude': 25, 'latitude': 25, 'level': 1}
Spatial resolution | 0.25 degrees
Spatial reference | WGS84 (EPSG:4326)
Temporal resolution | 1 hour
Update frequency | None

[10]:
ds=cat.atmosphere.era5_reanalysis_pressure_levels.to_dask()
ds
[10]:
<xarray.Dataset>
Dimensions:    (latitude: 93, level: 6, longitude: 177, time: 359400)
Coordinates:
  * latitude   (latitude) float32 63.0 62.75 62.5 62.25 ... 40.5 40.25 40.0
  * level      (level) int32 300 400 500 700 850 1000
  * longitude  (longitude) float32 -96.0 -95.75 -95.5 ... -52.5 -52.25 -52.0
  * time       (time) datetime64[ns] 1979-01-01 ... 2019-12-31T23:00:00
Data variables:
    r          (time, level, latitude, longitude) float32 dask.array<chunksize=(8760, 1, 25, 25), meta=np.ndarray>
    t          (time, level, latitude, longitude) float32 dask.array<chunksize=(8760, 1, 25, 25), meta=np.ndarray>
    u          (time, level, latitude, longitude) float32 dask.array<chunksize=(8760, 1, 25, 25), meta=np.ndarray>
    v          (time, level, latitude, longitude) float32 dask.array<chunksize=(8760, 1, 25, 25), meta=np.ndarray>
    z          (time, level, latitude, longitude) float32 dask.array<chunksize=(8760, 1, 25, 25), meta=np.ndarray>
Attributes:
    Conventions:  CF-1.6
    history:      2019-12-18 03:49:32 GMT by grib_to_netcdf-2.14.0: /opt/ecmw...

Working with the data

[11]:
%%time
ds.z \
.sel(longitude=-75, latitude=45, level=[500, 700, 850, 1000]).hvplot(grid=True, by='level')
CPU times: user 1.44 s, sys: 242 ms, total: 1.68 s
Wall time: 17.3 s
[11]:

c) ERA5-Land

ERA5-Land is a land-surface reanalysis covering the period from January 1950 to present. It was produced by replaying the land component of the ECMWF ERA5 reanalysis at a higher resolution than ERA5. ERA5-Land is produced by the Copernicus Climate Change Service (C3S) at ECMWF.

Unlike ERA5, ERA5-Land does not assimilate observations directly: it is driven by ERA5 atmospheric fields (corrected for the altitude difference between the two grids), so observations influence it only indirectly through that forcing. See section a) for background on reanalysis and data assimilation.

Property | Values
Temporal extent | 01/01/1950 – present
Spatial extent | North America: [-167, -50, 15, 85] (lon_min, lon_max, lat_min, lat_max)
Chunks (time-series version) | {'time': 8760, 'longitude': 7, 'latitude': 7}
Chunks (spatial version) | {'time': 24, 'longitude': 1171, 'latitude': 701}
Spatial resolution | 0.1 degrees
Spatial reference | WGS84 (EPSG:4326)
Temporal resolution | 1 hour
Update frequency | Monthly (planned for 2023)

This dataset is scheduled to be added to the catalog in December 2022. Once it is available, the ERA5 examples above can be adapted to it.
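The grid sizes in the tables above follow from the spatial extent and resolution, assuming (as the ERA5 outputs in this notebook show) that both ends of the extent are grid points. The exception is a global longitude axis, where -180 and 180 are the same meridian, so only one of them is stored. A small sketch that reproduces the sizes seen in this document (`n_points` is a hypothetical helper):

```python
def n_points(vmin, vmax, res, periodic=False):
    """Number of grid points from vmin to vmax (inclusive) at spacing res.

    On a periodic axis (global longitude) the end point duplicates the start
    point and is not counted. round() absorbs floating-point error in the division.
    """
    n = round((vmax - vmin) / res)
    return n if periodic else n + 1

print(n_points(-167, -50, 0.1), n_points(15, 85, 0.1))                    # ERA5-Land: 1171 701
print(n_points(-180, 180, 0.25, periodic=True), n_points(-90, 90, 0.25))  # ERA5: 1440 721
print(n_points(-96, -52, 0.25), n_points(40, 63, 0.25))                   # ERA5 pressure levels: 177 93
```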

d) Daymet

The Daymet dataset contains daily minimum temperature, maximum temperature, precipitation, shortwave radiation, vapor pressure, snow water equivalent, and day length at 1km resolution for North America. Annual and monthly summaries are also available. The dataset covers the period from January 1, 1980 to December 31, 2020.

Daymet is available on Azure in Zarr format through Microsoft's Planetary Computer; here it is opened through the catalog and read directly into an xarray dataset.

Property | Values
Temporal extent | 01/01/1980 – 12/31/2020
Spatial extent | North America
Chunks | {'time': 365, 'y': 284, 'x': 584}
Spatial resolution | 1 km
Spatial reference | Custom Lambert conformal conic ('+ellps=WGS84 +proj=lcc +lon_0=-100 +lat_0=42.5 +x_0=0.0 +y_0=0.0 +lat_1=25 +lat_2=60 +no_defs')
Temporal resolution | 1 day
Update frequency | None

Data access

[12]:
ds=cat.atmosphere.daymet_daily_na.to_dask()
ds
[12]:
<xarray.Dataset>
Dimensions:                  (time: 14965, y: 8075, x: 7814, nv: 2)
Coordinates:
    lat                      (y, x) float32 dask.array<chunksize=(284, 584), meta=np.ndarray>
    lon                      (y, x) float32 dask.array<chunksize=(284, 584), meta=np.ndarray>
  * time                     (time) datetime64[ns] 1980-01-01T12:00:00 ... 20...
  * x                        (x) float32 -4.56e+06 -4.559e+06 ... 3.253e+06
  * y                        (y) float32 4.984e+06 4.983e+06 ... -3.09e+06
Dimensions without coordinates: nv
Data variables:
    dayl                     (time, y, x) float32 dask.array<chunksize=(365, 284, 584), meta=np.ndarray>
    lambert_conformal_conic  int16 ...
    prcp                     (time, y, x) float32 dask.array<chunksize=(365, 284, 584), meta=np.ndarray>
    srad                     (time, y, x) float32 dask.array<chunksize=(365, 284, 584), meta=np.ndarray>
    swe                      (time, y, x) float32 dask.array<chunksize=(365, 284, 584), meta=np.ndarray>
    time_bnds                (time, nv) datetime64[ns] dask.array<chunksize=(365, 2), meta=np.ndarray>
    tmax                     (time, y, x) float32 dask.array<chunksize=(365, 284, 584), meta=np.ndarray>
    tmin                     (time, y, x) float32 dask.array<chunksize=(365, 284, 584), meta=np.ndarray>
    vp                       (time, y, x) float32 dask.array<chunksize=(365, 284, 584), meta=np.ndarray>
    yearday                  (time) int16 dask.array<chunksize=(365,), meta=np.ndarray>
Attributes:
    Conventions:       CF-1.6
    Version_data:      Daymet Data Version 4.0
    Version_software:  Daymet Software Version 4.0
    citation:          Please see http://daymet.ornl.gov/ for current Daymet ...
    references:        Please see http://daymet.ornl.gov/ for current informa...
    source:            Daymet Software Version 4.0
    start_year:        1980
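The time dimension above has 14965 daily steps for 1980 to 2020, which is exactly 41 × 365. This is because Daymet uses a 365-day year: in leap years, February 29 is kept and December 31 is dropped. The pandas sketch below rebuilds that calendar to show the count; it only reproduces the list of days, not the 12:00 timestamps stored in the dataset.

```python
import pandas as pd

days = pd.date_range("1980-01-01", "2020-12-31", freq="D")
# Daymet keeps Feb 29 in leap years and discards Dec 31 instead
drop = days.is_leap_year & (days.month == 12) & (days.day == 31)
daymet_days = days[~drop]
print(len(days), len(daymet_days))  # 14976 14965
```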

Working with the data

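Daymet uses a Lambert conformal conic projection, so its lat and lon coordinates are 2-D arrays indexed by (y, x), and a point cannot be selected with a simple .sel(lat=..., lon=...). We therefore use the xoak library, which builds a nearest-neighbour index on these 2-D coordinates. Alternatively, the data can be regridded or reprojected to facilitate analysis.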
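The ball-tree index that xoak builds is a fast nearest-neighbour search on the 2-D lat/lon arrays, and scikit-learn's BallTree rejects NaN coordinates (see the "Input contains NaN" warnings after the next cell). For a single point, a brute-force search with the haversine great-circle distance is a simple alternative that skips NaN cells. This sketch is not xoak's implementation; `nearest_yx` is a hypothetical helper and the small curvilinear grid is made up.

```python
import numpy as np

def nearest_yx(lat2d, lon2d, lat, lon):
    """Index (y, x) of the grid cell closest to (lat, lon) on a 2-D lat/lon grid.

    Uses the haversine angular distance; cells with NaN coordinates are ignored.
    """
    phi1, phi2 = np.radians(lat), np.radians(lat2d)
    dphi = phi2 - phi1
    dlmb = np.radians(lon2d - lon)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlmb / 2) ** 2
    dist = 2 * np.arcsin(np.sqrt(np.clip(a, 0, 1)))   # radians; NaN where coords are NaN
    return np.unravel_index(np.nanargmin(dist), dist.shape)

# Made-up 11 x 11 grid (lat varies along y, lon along x) with one NaN cell
lat2d, lon2d = np.meshgrid(np.linspace(40, 50, 11), np.linspace(-80, -70, 11), indexing="ij")
lat2d[0, 0] = lon2d[0, 0] = np.nan
print(nearest_yx(lat2d, lon2d, 45.0, -75.0))

# Hypothetical use with Daymet (loads both 2-D coordinate arrays, roughly 250 MB each):
# iy, ix = nearest_yx(ds.lat.values, ds.lon.values, 45, -75)
# ds.tmax.isel(y=iy, x=ix)
```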

[13]:
%%time
# Restrict to one year to keep the query light
ds = ds.sel(time=slice('2000-01-01', '2001-01-01'))

# Query point in degrees
points = xr.Dataset(
    {
        "lat": 45,
        "lon": -75,
    }
)

# Build one ball-tree index on the Dataset's 2-D lat/lon coordinates
# (rather than one per variable), then select the nearest grid cell for all variables
ds.xoak.set_index(["lat", "lon"], "sklearn_geo_balltree")
ds_point = ds.xoak.sel(lat=points.lat, lon=points.lon)

(ds_point.tmax.hvplot(grid=True,
                      value_label='daily temperature (degrees C)') *
 ds_point.tmin.hvplot(grid=True) +
 ds_point.prcp.hvplot(grid=True) +
 ds_point.swe.hvplot(grid=True)
).cols(1)

2022-11-17 18:21:10,015 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-1692116a-9241-400a-b9c5-44f3a5e9c733
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  15.250029, -141.61205 ],
       [  15.254093, -141.60435 ],
       [  15.258157, -141.59666 ],
       ...,
       [  19.86918 ,  -69.16344 ],
       [  19.86601 ,  -69.15479 ],
       [  19.86284 ,  -69.14615 ]], dtype=float32), 53260224)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,023 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-e30935e5-cf8e-47d1-856f-9ea6c9a83f94
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  15.250029, -141.61205 ],
       [  15.254093, -141.60435 ],
       [  15.258157, -141.59666 ],
       ...,
       [  19.86918 ,  -69.16344 ],
       [  19.86601 ,  -69.15479 ],
       [  19.86284 ,  -69.14615 ]], dtype=float32), 53260224)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,031 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-4f9fee3c-b919-4256-90a6-d104916b2c48
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  15.250029, -141.61205 ],
       [  15.254093, -141.60435 ],
       [  15.258157, -141.59666 ],
       ...,
       [  19.86918 ,  -69.16344 ],
       [  19.86601 ,  -69.15479 ],
       [  19.86284 ,  -69.14615 ]], dtype=float32), 53260224)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,036 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-93b20a32-2e9c-4c45-8f9a-8893f016eeaf
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  15.250029, -141.61205 ],
       [  15.254093, -141.60435 ],
       [  15.258157, -141.59666 ],
       ...,
       [  19.86918 ,  -69.16344 ],
       [  19.86601 ,  -69.15479 ],
       [  19.86284 ,  -69.14615 ]], dtype=float32), 53260224)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,041 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-b2182707-4cc9-4bce-a79e-6fc1559d7e5d
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  15.100861, -141.52846 ],
       [  15.104913, -141.52078 ],
       [  15.108965, -141.5131  ],
       ...,
       [  19.705662,  -69.2303  ],
       [  19.702501,  -69.22167 ],
       [  19.69934 ,  -69.21304 ]], dtype=float32), 53416504)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,047 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-9b6a6d39-b04c-455d-b3ff-c809e10f6f30
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  15.100861, -141.52846 ],
       [  15.104913, -141.52078 ],
       [  15.108965, -141.5131  ],
       ...,
       [  19.705662,  -69.2303  ],
       [  19.702501,  -69.22167 ],
       [  19.69934 ,  -69.21304 ]], dtype=float32), 53416504)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,164 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-75739519-0f14-4242-9419-30de2c56dd53
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  15.100861, -141.52846 ],
       [  15.104913, -141.52078 ],
       [  15.108965, -141.5131  ],
       ...,
       [  19.705662,  -69.2303  ],
       [  19.702501,  -69.22167 ],
       [  19.69934 ,  -69.21304 ]], dtype=float32), 53416504)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,169 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-bfa9f207-c6c9-4671-9331-851af02966ca
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  15.100861, -141.52846 ],
       [  15.104913, -141.52078 ],
       [  15.108965, -141.5131  ],
       ...,
       [  19.705662,  -69.2303  ],
       [  19.702501,  -69.22167 ],
       [  19.69934 ,  -69.21304 ]], dtype=float32), 53416504)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,597 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-629e996a-12e1-477d-8b59-63b8a9192d3c
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  14.95178 , -141.44519 ],
       [  14.955821, -141.43752 ],
       [  14.959861, -141.42986 ],
       ...,
       [  19.542261,  -69.29688 ],
       [  19.539112,  -69.28827 ],
       [  19.535961,  -69.279655]], dtype=float32), 53572784)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,605 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-22628e4e-2f11-4fdb-9f41-7b717004f5ce
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  14.95178 , -141.44519 ],
       [  14.955821, -141.43752 ],
       [  14.959861, -141.42986 ],
       ...,
       [  19.542261,  -69.29688 ],
       [  19.539112,  -69.28827 ],
       [  19.535961,  -69.279655]], dtype=float32), 53572784)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,619 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-23c4f98c-5f9e-404a-92a8-ed1a2c77dd68
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  14.95178 , -141.44519 ],
       [  14.955821, -141.43752 ],
       [  14.959861, -141.42986 ],
       ...,
       [  19.542261,  -69.29688 ],
       [  19.539112,  -69.28827 ],
       [  19.535961,  -69.279655]], dtype=float32), 53572784)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,624 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-5f63f7ae-2464-497e-96dc-07f5cda651eb
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  14.95178 , -141.44519 ],
       [  14.955821, -141.43752 ],
       [  14.959861, -141.42986 ],
       ...,
       [  19.542261,  -69.29688 ],
       [  19.539112,  -69.28827 ],
       [  19.535961,  -69.279655]], dtype=float32), 53572784)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,631 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-6b56bee5-9fad-4f40-b015-07c45533e7d7
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  14.802789, -141.36221 ],
       [  14.806817, -141.35455 ],
       [  14.810844, -141.34691 ],
       ...,
       [  19.378983,  -69.36319 ],
       [  19.375841,  -69.35459 ],
       [  19.3727  ,  -69.34599 ]], dtype=float32), 53729064)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,637 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-b40f203f-423d-43e6-bf03-884a331c0e8c
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  14.802789, -141.36221 ],
       [  14.806817, -141.35455 ],
       [  14.810844, -141.34691 ],
       ...,
       [  19.378983,  -69.36319 ],
       [  19.375841,  -69.35459 ],
       [  19.3727  ,  -69.34599 ]], dtype=float32), 53729064)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,643 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-42e0c7db-a239-469b-95b1-2a3689ce740e
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  14.802789, -141.36221 ],
       [  14.806817, -141.35455 ],
       [  14.810844, -141.34691 ],
       ...,
       [  19.378983,  -69.36319 ],
       [  19.375841,  -69.35459 ],
       [  19.3727  ,  -69.34599 ]], dtype=float32), 53729064)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,648 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-9394e2b3-d9ff-45b0-b1c8-ae5d46eefc37
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  14.802789, -141.36221 ],
       [  14.806817, -141.35455 ],
       [  14.810844, -141.34691 ],
       ...,
       [  19.378983,  -69.36319 ],
       [  19.375841,  -69.35459 ],
       [  19.3727  ,  -69.34599 ]], dtype=float32), 53729064)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,654 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-8dca9341-88fb-4db9-ab8c-da0c5053eeef
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  14.653887, -141.27956 ],
       [  14.657903, -141.27191 ],
       [  14.661919, -141.26425 ],
       ...,
       [  19.215822,  -69.42923 ],
       [  19.212692,  -69.42065 ],
       [  19.20956 ,  -69.41206 ]], dtype=float32), 53885344)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,671 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-7fbb48f3-d75b-49a2-b0a8-83ffc70169c1
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  14.653887, -141.27956 ],
       [  14.657903, -141.27191 ],
       [  14.661919, -141.26425 ],
       ...,
       [  19.215822,  -69.42923 ],
       [  19.212692,  -69.42065 ],
       [  19.20956 ,  -69.41206 ]], dtype=float32), 53885344)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,681 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-1e01adb1-edf7-4285-9dbc-12bc5f48797c
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  14.653887, -141.27956 ],
       [  14.657903, -141.27191 ],
       [  14.661919, -141.26425 ],
       ...,
       [  19.215822,  -69.42923 ],
       [  19.212692,  -69.42065 ],
       [  19.20956 ,  -69.41206 ]], dtype=float32), 53885344)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,797 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-3a6fc925-5f6c-4341-9e44-6858d89d19a2
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  14.653887, -141.27956 ],
       [  14.657903, -141.27191 ],
       [  14.661919, -141.26425 ],
       ...,
       [  19.215822,  -69.42923 ],
       [  19.212692,  -69.42065 ],
       [  19.20956 ,  -69.41206 ]], dtype=float32), 53885344)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,802 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-1816942a-e436-4019-ab12-1fcc7c138a38
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  14.505075, -141.19719 ],
       [  14.50908 , -141.18954 ],
       [  14.513083, -141.18192 ],
       ...,
       [  19.052786,  -69.494995],
       [  19.049665,  -69.48643 ],
       [  19.046545,  -69.47785 ]], dtype=float32), 54041624)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 18:21:10,808 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-1a8769f4-16ac-45c2-8952-5ccd7bd40500
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  14.505075, -141.19719 ],
       [  14.50908 , -141.18954 ],
       [  14.513083, -141.18192 ],
       ...,
       [  19.052786,  -69.494995],
       [  19.049665,  -69.48643 ],
       [  19.046545,  -69.47785 ]], dtype=float32), 54041624)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File <timed exec>:21

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/xoak/accessor.py:257, in XoakAccessor.sel(self, indexers, **indexers_kwargs)
    253 indices = self._query(indexers)
    255 if not isinstance(indices, np.ndarray):
    256     # TODO: remove (see todo below)
--> 257     indices = indices.compute()
    259 pos_indexers = self._get_pos_indexers(indices, indexers)
    261 # TODO: issue in xarray. 1-dimensional xarray.Variables are always considered
    262 # as OuterIndexer, while we want here VectorizedIndexer
    263 # This would also allow lazy selection

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/dask/base.py:315, in DaskMethodsMixin.compute(self, **kwargs)
    291 def compute(self, **kwargs):
    292     """Compute this dask collection
    293
    294     This turns a lazy Dask collection into its in-memory equivalent.
   (...)
    313     dask.base.compute
    314     """
--> 315     (result,) = compute(self, traverse=False, **kwargs)
    316     return result

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/dask/base.py:600, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    597     keys.append(x.__dask_keys__())
    598     postcomputes.append(x.__dask_postcompute__())
--> 600 results = schedule(dsk, keys, **kwargs)
    601 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/distributed/client.py:3122, in Client.get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
   3120         should_rejoin = False
   3121 try:
-> 3122     results = self.gather(packed, asynchronous=asynchronous, direct=direct)
   3123 finally:
   3124     for f in futures.values():

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/distributed/client.py:2291, in Client.gather(self, futures, errors, direct, asynchronous)
   2289 else:
   2290     local_worker = None
-> 2291 return self.sync(
   2292     self._gather,
   2293     futures,
   2294     errors=errors,
   2295     direct=direct,
   2296     local_worker=local_worker,
   2297     asynchronous=asynchronous,
   2298 )

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/distributed/utils.py:339, in SyncMethodMixin.sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    337     return future
    338 else:
--> 339     return sync(
    340         self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    341     )

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/distributed/utils.py:406, in sync(loop, func, callback_timeout, *args, **kwargs)
    404 if error:
    405     typ, exc, tb = error
--> 406     raise exc.with_traceback(tb)
    407 else:
    408     return result

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/distributed/utils.py:379, in sync.<locals>.f()
    377         future = asyncio.wait_for(future, callback_timeout)
    378     future = asyncio.ensure_future(future)
--> 379     result = yield future
    380 except Exception:
    381     error = sys.exc_info()

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/tornado/gen.py:762, in Runner.run(self)
    759 exc_info = None
    761 try:
--> 762     value = future.result()
    763 except Exception:
    764     exc_info = sys.exc_info()

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/distributed/client.py:2154, in Client._gather(self, futures, errors, direct, local_worker)
   2152         exc = CancelledError(key)
   2153     else:
-> 2154         raise exception.with_traceback(traceback)
   2155     raise exc
   2156 if errors == "skip":

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/xoak/index/base.py:227, in __init__()
    224 index_adapter_cls = normalize_index(index_adapter)
    226 self._index_adapter = index_adapter_cls(**kwargs)
--> 227 self._index = self._index_adapter.build(points)
    228 self._offset = offset

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/xoak/index/sklearn_adapters.py:55, in build()
     54 def build(self, points):
---> 55     return BallTree(np.deg2rad(points), **self._index_options)

File sklearn/neighbors/_binary_tree.pxi:833, in sklearn.neighbors._ball_tree.BinaryTree.__init__()

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/sklearn/utils/validation.py:899, in check_array()
    893         raise ValueError(
    894             "Found array with dim %d. %s expected <= 2."
    895             % (array.ndim, estimator_name)
    896         )
    898     if force_all_finite:
--> 899         _assert_all_finite(
    900             array,
    901             input_name=input_name,
    902             estimator_name=estimator_name,
    903             allow_nan=force_all_finite == "allow-nan",
    904         )
    906 if ensure_min_samples > 0:
    907     n_samples = _num_samples(array)

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/sklearn/utils/validation.py:146, in _assert_all_finite()
    124         if (
    125             not allow_nan
    126             and estimator_name
   (...)
    130             # Improve the error message on how to handle missing values in
    131             # scikit-learn.
    132             msg_err += (
    133                 f"\n{estimator_name} does not accept missing values"
    134                 " encoded as NaN natively. For supervised learning, you might want"
   (...)
    144                 "#estimators-that-handle-nan-values"
    145             )
--> 146         raise ValueError(msg_err)
    148 # for object dtype data, we only check for NaNs (GH-13254)
    149 elif X.dtype == np.dtype("object") and not allow_nan:

ValueError: Input contains NaN.
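The traceback shows where the cell fails. xoak's sklearn_geo_balltree adapter converts the latitude/longitude points to radians and passes them to scikit-learn's BallTree, and BallTree's input validation rejects NaN values. One possible workaround, assuming the NaNs are simply missing coordinates (for example grid cells without a valid position), is to drop non-finite points before searching and then map the result back to the original grid. The sketch below shows that masking idea with a brute-force haversine nearest-neighbour search in NumPy. The helper nearest_valid_point is hypothetical and not part of xoak, and a brute-force search is only practical for modest grid sizes.

```python
import numpy as np


def nearest_valid_point(lat2d, lon2d, qlat, qlon):
    """Return the (row, col) of the grid cell nearest to (qlat, qlon).

    Cells whose latitude or longitude is NaN are skipped. Brute-force
    haversine search: a sketch of the masking idea, not xoak's implementation.
    """
    lat = np.asarray(lat2d, dtype=float)
    lon = np.asarray(lon2d, dtype=float)
    valid = np.isfinite(lat) & np.isfinite(lon)  # mask out missing coordinates
    if not valid.any():
        raise ValueError("no finite coordinates to search")
    flat_idx = np.flatnonzero(valid)  # positions of valid cells in the flattened grid
    phi1 = np.deg2rad(lat.ravel()[flat_idx])
    lam1 = np.deg2rad(lon.ravel()[flat_idx])
    phi2, lam2 = np.deg2rad(qlat), np.deg2rad(qlon)
    # Haversine formula: central angle between the query and every valid cell
    a = (np.sin((phi2 - phi1) / 2) ** 2
         + np.cos(phi1) * np.cos(phi2) * np.sin((lam2 - lam1) / 2) ** 2)
    angle = 2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    best = flat_idx[np.argmin(angle)]  # map back to the original (unmasked) position
    return tuple(int(i) for i in np.unravel_index(best, lat.shape))


# Toy 2x2 curvilinear grid with one missing cell
lat = np.array([[45.0, 45.0], [np.nan, 46.0]])
lon = np.array([[-75.0, -74.0], [np.nan, -75.0]])
print(nearest_valid_point(lat, lon, 45.9, -75.1))  # (1, 1)
```

The same masking step would need to happen before building any tree-based index. For large grids, a spatial index such as BallTree built only on the finite points scales far better than this brute-force search.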

e) SCDNA (extent: North America)

Station-based serially complete datasets (SCDs) of precipitation and temperature observations are important for hydrometeorological studies. Motivated by the lack of serially complete station observations for North America, SCDNA was developed from station data for the period 1979 to 2018. It includes daily precipitation, minimum temperature (Tmin), and maximum temperature (Tmax) for 27,276 stations. Raw meteorological station data were obtained from the Global Historical Climatology Network Daily (GHCN-D), the Global Surface Summary of the Day (GSOD), Environment and Climate Change Canada (ECCC), and a compiled station database in Mexico. Stations with records at least 8 years long were selected, corrected for location errors, and subjected to strict quality control. Outputs from three reanalysis products (ERA5, JRA-55, and MERRA-2) provided auxiliary information used to estimate missing station records.

Property              Values
Temporal extent       01/01/1979 – 12/31/2018
Spatial extent        North America: [-177, -52, 7, 83]
Chunks                {'time': 1000, 'ID': 1000}
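Chunk sizes determine how much data each read pulls from storage. As a quick back-of-the-envelope check, the 1000 × 1000 chunking above splits each (ID, time) variable into a few hundred chunks of a few megabytes each. The dimension sizes (ID: 27276, time: 14610) are taken from the dataset summary printed below. The sizes computed here are uncompressed; the stored chunks may be compressed and therefore smaller.

```python
import math
from datetime import date

# Dimension sizes from the SCDNA dataset summary and chunk sizes from the table
n_id = 27276
n_days = (date(2018, 12, 31) - date(1979, 1, 1)).days + 1  # 14610 daily steps
chunk_id, chunk_time = 1000, 1000

chunks_id = math.ceil(n_id / chunk_id)          # 28 (the last one holds 276 stations)
chunks_time = math.ceil(n_days / chunk_time)    # 15 (the last one holds 610 days)
n_chunks = chunks_id * chunks_time              # chunks per (ID, time) variable

chunk_mb_f32 = chunk_id * chunk_time * 4 / 1e6  # prcp, tmin, tmax (float32)
chunk_mb_f64 = chunk_id * chunk_time * 8 / 1e6  # *_flag variables (float64)
prcp_gb = n_id * n_days * 4 / 1e9               # whole prcp array, uncompressed

print(n_days, chunks_id, chunks_time, n_chunks)  # 14610 28 15 420
print(chunk_mb_f32, chunk_mb_f64)                # 4.0 8.0
print(round(prcp_gb, 2))                         # 1.59
```

At roughly 1.6 GB uncompressed, a single variable would still fit in memory on many machines. The chunking matters more for selective reads: for example, a short time window for all stations touches only the 28 chunks of one time block.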

[14]:
ds=cat.atmosphere.scdna.to_dask()
ds
[14]:
<xarray.Dataset>
Dimensions:    (ID: 27276, time: 14610)
Coordinates:
  * ID         (ID) <U13 'GS91066022701' 'GHMQW00022701' ... 'ECCA008402568'
    elevation  (ID) float32 dask.array<chunksize=(1000,), meta=np.ndarray>
    latitude   (ID) float32 dask.array<chunksize=(1000,), meta=np.ndarray>
    longitude  (ID) float32 dask.array<chunksize=(1000,), meta=np.ndarray>
  * time       (time) datetime64[ns] 1979-01-01 1979-01-02 ... 2018-12-31
Data variables:
    prcp       (ID, time) float32 dask.array<chunksize=(1000, 1000), meta=np.ndarray>
    prcp_flag  (ID, time) float64 dask.array<chunksize=(1000, 1000), meta=np.ndarray>
    prcp_kge   (ID) float32 dask.array<chunksize=(1000,), meta=np.ndarray>
    sflag      (ID) <U3 dask.array<chunksize=(1000,), meta=np.ndarray>
    tmax       (ID, time) float32 dask.array<chunksize=(1000, 1000), meta=np.ndarray>
    tmax_flag  (ID, time) float64 dask.array<chunksize=(1000, 1000), meta=np.ndarray>
    tmax_kge   (ID) float32 dask.array<chunksize=(1000,), meta=np.ndarray>
    tmin       (ID, time) float32 dask.array<chunksize=(1000, 1000), meta=np.ndarray>
    tmin_flag  (ID, time) float64 dask.array<chunksize=(1000, 1000), meta=np.ndarray>
    tmin_kge   (ID) float32 dask.array<chunksize=(1000,), meta=np.ndarray>
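In SCDNA, latitude and longitude are coordinates along the ID dimension rather than dimensions of their own, so a region cannot be cut out with .sel(latitude=slice(...)). A boolean mask on the station coordinates does the job instead. The sketch below uses NumPy only. The helper stations_in_bbox is hypothetical, the coordinates are made up, and the box is assumed not to cross the antimeridian. With the dataset above, the returned positions could be passed to ds.isel(ID=...) once the lazy coordinates have been loaded, for example via ds.latitude.values and ds.longitude.values.

```python
import numpy as np


def stations_in_bbox(lat, lon, west, east, south, north):
    """Return the integer positions of the stations inside a lon/lat box.

    Assumes longitudes in [-180, 180] and a box that does not cross the
    antimeridian. Stations with NaN coordinates compare False and are excluded.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    inside = (lon >= west) & (lon <= east) & (lat >= south) & (lat <= north)
    return np.flatnonzero(inside)


# Made-up station coordinates: Saguenay, Montreal, Mexico City, missing
lat = np.array([48.4, 45.5, 19.4, np.nan])
lon = np.array([-71.1, -73.6, -99.1, -70.0])
print(stations_in_bbox(lat, lon, west=-80, east=-60, south=44, north=52))  # [0 1]
```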
[15]:
%%time
ds.prcp \
.sel(time=slice('1996-07-19','1996-07-20')) \
.sum('time') \
.to_dataframe() \
.replace({'prcp': {0: np.nan}}) \
.dropna(how='any') \
.hvplot.points(x='longitude',
               y='latitude',
               color='prcp',
               geo=True,
               alpha=0.5,
               xlim=(-180,-30),
               ylim=(0,72),
               tiles='ESRI',
               cmap='gist_ncar',
               clim=(0,100),
               hover_cols=['ID','prcp'],
               width=700,
               height=400,
               title='48h precipitation during the Saguenay flood event')
CPU times: user 1.68 s, sys: 140 ms, total: 1.82 s
Wall time: 24.1 s
[15]:
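The zero-to-NaN replacement in the cell above is meant to hide stations that received no rain during the 48-hour window. Because to_dataframe() also returns the elevation, latitude and longitude coordinates as columns, a frame-wide replace({0: np.nan}) would also discard any station whose elevation is exactly 0. Restricting the replacement to the prcp column avoids that. The toy frame below uses made-up values to show the difference.

```python
import numpy as np
import pandas as pd

# Toy frame shaped like ds.prcp.sum('time').to_dataframe():
# one row per station, non-index coordinates as extra columns (values made up)
df = pd.DataFrame(
    {
        "elevation": [0.0, 150.0, 30.0],  # station A sits at sea level
        "latitude": [48.4, 48.3, 45.5],
        "longitude": [-71.1, -70.9, -73.6],
        "prcp": [120.0, 0.0, 35.0],       # 48 h precipitation totals (mm)
    },
    index=pd.Index(["A", "B", "C"], name="ID"),
)

# Frame-wide replace: elevation == 0 also becomes NaN, so station A is lost
whole_frame = df.replace({0: np.nan}).dropna(how="any")

# Column-specific replace: only the dry station (B) is discarded
prcp_only = df.replace({"prcp": {0: np.nan}}).dropna(how="any")

print(list(whole_frame.index))  # ['C']
print(list(prcp_only.index))    # ['A', 'C']
```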

f) 20th Century Reanalysis - single levels (extent: Atlantic Northeast)

Using a state-of-the-art data assimilation system and surface pressure observations, the NOAA-CIRES-DOE Twentieth Century Reanalysis (20CR) project has generated a four-dimensional global atmospheric dataset of weather spanning 1836 to 2015 to place current atmospheric circulation patterns into a historical perspective.

Property              Values
Temporal extent       01/01/1836 – 12/31/2015
Spatial extent        Atlantic Northeast [-96, -52, 40, 63]
Chunks                {'time': 32872, 'longitude': 6, 'latitude': 3}
Spatial resolution    1 degree
Spatial reference     WGS84 (EPSG:4326)
Temporal resolution   3 hours
Update frequency      None
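The size of the time dimension (525952 in the dataset summary below) follows directly from the 3-hourly resolution over 1836 to 2015. The time chunk of 32872 steps divides it exactly, so a complete time series at one grid point spans 16 chunks. A quick check with the standard library:

```python
from datetime import datetime, timedelta

start = datetime(1836, 1, 1, 0)
end = datetime(2015, 12, 31, 21)   # last time stamp shown in the dataset summary
step = timedelta(hours=3)          # 3-hourly (8x daily) analyses

n_steps = (end - start) // step + 1
print(n_steps)                     # 525952

# Number of time chunks a single grid point's full series is spread over
print(n_steps / 32872)             # 16.0
```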

Data access

[16]:
ds=cat.atmosphere['20_century_reanalysis_single_levels'].to_dask()
ds
[16]:
<xarray.Dataset>
Dimensions:    (time: 525952, latitude: 24, longitude: 45)
Coordinates:
  * latitude   (latitude) float32 40.0 41.0 42.0 43.0 ... 60.0 61.0 62.0 63.0
  * longitude  (longitude) float32 -96.0 -95.0 -94.0 -93.0 ... -54.0 -53.0 -52.0
  * time       (time) datetime64[ns] 1836-01-01 ... 2015-12-31T21:00:00
Data variables:
    apcp       (time, latitude, longitude) float32 dask.array<chunksize=(32872, 3, 6), meta=np.ndarray>
    cape       (time, latitude, longitude) float32 dask.array<chunksize=(32872, 3, 6), meta=np.ndarray>
    crain      (time, latitude, longitude) float32 dask.array<chunksize=(32872, 3, 6), meta=np.ndarray>
    pr_wtr     (time, latitude, longitude) float32 dask.array<chunksize=(32872, 3, 6), meta=np.ndarray>
    prate      (time, latitude, longitude) float32 dask.array<chunksize=(32872, 3, 6), meta=np.ndarray>
    tcdc       (time, latitude, longitude) float32 dask.array<chunksize=(32872, 3, 6), meta=np.ndarray>
    tmax       (time, latitude, longitude) float32 dask.array<chunksize=(32872, 3, 6), meta=np.ndarray>
    tmin       (time, latitude, longitude) float32 dask.array<chunksize=(32872, 3, 6), meta=np.ndarray>
Attributes: (12/24)
    Conventions:               CF-1.2
    References:                https://www.psl.noaa.gov/data/gridded/data.20t...
    assimilation_algorithm:    Ensemble Kalman Filter with 4DIAU
    citation:                  Compo,G.P. <https://www.psl.noaa.gov/people/gi...
    citation1:                 Slivinski, L. C, G. P. Compo, J. S. Whitaker, ...
    comments:                  Data are from \nNOAA/CIRES/DOE 20th Century Re...
    ...                        ...
    product:                   reanalysis
    source:                    20CRv3si 2018, Ensemble Kalman Filter, ocean (...
    spatial_resolution:        1.0 degree
    standard_name_vocabulary:  NetCDF Climate and Forecast (CF) Metadata Conv...
    title:                     8x Daily NOAA/CIRES/DOE 20th Century Reanalysi...
    version:                   3si

Working with the data

Here we plot the precipitation rate (prate) time series at a single grid point (45°N, 75°W):

[17]:
%%time
ds.sel(latitude=45,
       longitude=-75) \
.prate \
.hvplot(grid=True)
CPU times: user 150 ms, sys: 16.5 ms, total: 167 ms
Wall time: 2.01 s
[17]:
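The plot shows raw 3-hourly rates, and daily totals are often easier to interpret. The sketch below assumes that prate is in kg m-2 s-1 and that each value represents its 3-hour interval. Check ds.prate.attrs for the actual units before relying on this. Under the same assumptions, ds.prate.resample(time='1D').mean() * 86400 would perform the equivalent operation directly in xarray. daily_precip_mm is a hypothetical helper written in plain NumPy for illustration.

```python
import numpy as np

SECONDS_PER_DAY = 86_400
STEPS_PER_DAY = 8  # 3-hourly values


def daily_precip_mm(prate_3h):
    """Turn a 3-hourly precipitation-rate series into daily totals in mm.

    Assumes the rate is in kg m-2 s-1 (1 kg of water over 1 m2 is a 1 mm layer),
    that each value represents its 3-hour interval, and that the series starts
    at 00 UTC and covers whole days.
    """
    rate = np.asarray(prate_3h, dtype=float)
    if rate.size % STEPS_PER_DAY:
        raise ValueError("expected whole days of 3-hourly values")
    mean_rate = rate.reshape(-1, STEPS_PER_DAY).mean(axis=1)  # daily mean rate
    return mean_rate * SECONDS_PER_DAY                         # mm per day


# Toy series: one day at a constant 1e-5 kg m-2 s-1, then a dry day
series = np.concatenate([np.full(8, 1e-5), np.zeros(8)])
print(daily_precip_mm(series))
```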

g) 20th Century Reanalysis - single levels (large area: chunked for analysis in space)

Using a state-of-the-art data assimilation system and surface pressure observations, the NOAA-CIRES-DOE Twentieth Century Reanalysis (20CR) project has generated a four-dimensional global atmospheric dataset of weather spanning 1836 to 2015 to place current atmospheric circulation patterns into a historical perspective.

Property              Values
Temporal extent       01/01/1836 – 12/31/2015
Spatial extent        Atlantic Northeast [-96, -52, 40, 63]
Chunks                {'time': 100, 'longitude': 45, 'latitude': 24}
Spatial resolution    1 degree
Spatial reference     WGS84 (EPSG:4326)
Temporal resolution   3 hours
Update frequency      None
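This entry holds the same variables and grid as the previous one. Only the chunking differs: the previous entry uses long time chunks over small spatial tiles, while this one uses short time chunks that each cover the whole domain. Counting the chunks each access pattern has to read shows why both layouts are offered. The byte figures are upper bounds on uncompressed float32 data, since edge chunks can be smaller and stored chunks may be compressed. The helper functions are illustrative only.

```python
import math

SHAPE = (525952, 24, 45)   # (time, latitude, longitude) from the dataset summary
CHUNKS_F = (32872, 3, 6)   # section f): long time chunks, small spatial tiles
CHUNKS_G = (100, 24, 45)   # section g): short time chunks, whole domain
BYTES_PER_VALUE = 4        # float32


def chunks_touched(shape, chunks, selection):
    """Count the chunks a selection reads.

    selection holds, per dimension, an integer index (one position) or None
    (the whole axis).
    """
    n = 1
    for size, chunk, sel in zip(shape, chunks, selection):
        n *= 1 if sel is not None else math.ceil(size / chunk)
    return n


def max_bytes_read(shape, chunks, selection):
    """Upper bound on uncompressed bytes read (edge chunks can be smaller)."""
    return chunks_touched(shape, chunks, selection) * math.prod(chunks) * BYTES_PER_VALUE


point_series = (None, 5, 21)  # every time step at one cell (45N, 75W)
single_map = (0, None, None)  # one time step over the whole domain

for name, chunks in (("f", CHUNKS_F), ("g", CHUNKS_G)):
    for label, sel in (("time series", point_series), ("map", single_map)):
        n = chunks_touched(SHAPE, chunks, sel)
        mb = max_bytes_read(SHAPE, chunks, sel) / 1e6
        print(f"{name}) {label}: {n} chunks, <= {mb:.1f} MB")
```

A point time series touches 16 chunks with the chunking of section f) but 5260 with this one. A single map needs 64 chunks with the former and only 1 with the latter. This is why the time series example uses the first entry and the map example uses this one.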

[18]:
ds=cat.atmosphere['20_century_reanalysis_single_levels_large_area'].to_dask()
ds
[18]:
<xarray.Dataset>
Dimensions:    (time: 525952, latitude: 24, longitude: 45)
Coordinates:
  * latitude   (latitude) float32 40.0 41.0 42.0 43.0 ... 60.0 61.0 62.0 63.0
  * longitude  (longitude) float32 -96.0 -95.0 -94.0 -93.0 ... -54.0 -53.0 -52.0
  * time       (time) datetime64[ns] 1836-01-01 ... 2015-12-31T21:00:00
Data variables:
    apcp       (time, latitude, longitude) float32 dask.array<chunksize=(100, 24, 45), meta=np.ndarray>
    cape       (time, latitude, longitude) float32 dask.array<chunksize=(100, 24, 45), meta=np.ndarray>
    crain      (time, latitude, longitude) float32 dask.array<chunksize=(100, 24, 45), meta=np.ndarray>
    pr_wtr     (time, latitude, longitude) float32 dask.array<chunksize=(100, 24, 45), meta=np.ndarray>
    prate      (time, latitude, longitude) float32 dask.array<chunksize=(100, 24, 45), meta=np.ndarray>
    tcdc       (time, latitude, longitude) float32 dask.array<chunksize=(100, 24, 45), meta=np.ndarray>
    tmax       (time, latitude, longitude) float32 dask.array<chunksize=(100, 24, 45), meta=np.ndarray>
    tmin       (time, latitude, longitude) float32 dask.array<chunksize=(100, 24, 45), meta=np.ndarray>
Attributes: (12/24)
    Conventions:               CF-1.2
    References:                https://www.psl.noaa.gov/data/gridded/data.20t...
    assimilation_algorithm:    Ensemble Kalman Filter with 4DIAU
    citation:                  Compo,G.P. <https://www.psl.noaa.gov/people/gi...
    citation1:                 Slivinski, L. C, G. P. Compo, J. S. Whitaker, ...
    comments:                  Data are from \nNOAA/CIRES/DOE 20th Century Re...
    ...                        ...
    product:                   reanalysis
    source:                    20CRv3si 2018, Ensemble Kalman Filter, ocean (...
    spatial_resolution:        1.0 degree
    standard_name_vocabulary:  NetCDF Climate and Forecast (CF) Metadata Conv...
    title:                     8x Daily NOAA/CIRES/DOE 20th Century Reanalysi...
    version:                   3si

Working with the data

[19]:
%%time
ds.sel(time='2000-01-01T00:00') \
.tmax \
.hvplot(grid=True,
        cmap='cwr',
        geo=True,
        tiles='CartoLight',
        alpha=0.75,
        width=700,
        height=400)
CPU times: user 108 ms, sys: 0 ns, total: 108 ms
Wall time: 106 ms
[19]:

Other datasets

The previous examples can also be applied to the following datasets; we leave it to the reader to experiment with them.

[20]:
ds=cat.atmosphere['20_century_reanalysis_pressure_levels'].to_dask()
ds
[20]:
<xarray.Dataset>
Dimensions:    (time: 525952, level: 17, latitude: 24, longitude: 45)
Coordinates:
  * latitude   (latitude) float32 40.0 41.0 42.0 43.0 ... 60.0 61.0 62.0 63.0
  * level      (level) float64 1.0 5.0 10.0 20.0 ... 700.0 800.0 900.0 1e+03
  * longitude  (longitude) float32 -96.0 -95.0 -94.0 -93.0 ... -54.0 -53.0 -52.0
  * time       (time) datetime64[ns] 1836-01-01 ... 2015-12-31T21:00:00
Data variables:
    air        (time, level, latitude, longitude) float32 dask.array<chunksize=(29200, 1, 24, 25), meta=np.ndarray>
    hgt        (time, level, latitude, longitude) float32 dask.array<chunksize=(29200, 1, 24, 25), meta=np.ndarray>
    omega      (time, level, latitude, longitude) float32 dask.array<chunksize=(29200, 1, 24, 25), meta=np.ndarray>
    rhum       (time, level, latitude, longitude) float32 dask.array<chunksize=(29200, 1, 24, 25), meta=np.ndarray>
Attributes: (12/25)
    Conventions:                     CF-1.2
    DODS_EXTRA.Unlimited_Dimension:  time
    References:                      https://www.esrl.noaa.gov/psd/data/gridd...
    assimilation_algorithm:          Ensemble Kalman Filter with 4DIAU
    citation:                        Compo,G.P. <https://www.esrl.noaa.gov/ps...
    citation1:                       Slivinski, L. C, G. P. Compo, J. S. Whit...
    ...                              ...
    product:                         reanalysis
    source:                          20CRv3si 2018, Ensemble Kalman Filter, o...
    spatial_resolution:              1.0 degree
    standard_name_vocabulary:        NetCDF Climate and Forecast (CF) Metadat...
    title:                           8x Daily NOAA/CIRES/DOE 20th Century Rea...
    version:                         3si
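The repr above reports 525952 time steps between 1836-01-01 and 2015-12-31T21:00:00, and the title advertises "8x Daily" (3-hourly) data. A quick standard-library check, assuming the time axis is regular and gap-free, shows that these figures agree with each other. It also translates the time chunk length (29200 steps) into days:

```python
from datetime import date

def expected_steps(first_day: date, last_day: date, steps_per_day: int) -> int:
    """Number of steps in a regular, gap-free time axis covering
    first_day..last_day (inclusive) at `steps_per_day` samples per day."""
    n_days = (last_day - first_day).days + 1
    return n_days * steps_per_day

# 20th Century Reanalysis: 8 samples per day (00, 03, ..., 21 UTC)
n_steps = expected_steps(date(1836, 1, 1), date(2015, 12, 31), steps_per_day=8)
print(n_steps)    # 525952, the length of the `time` dimension above

# A time chunk of 29200 steps therefore spans 29200 / 8 days (about 10 years)
print(29200 / 8)  # 3650.0
```

The same helper can be pointed at the other entries below (for example one step per day for daily data) to confirm that a time axis has no missing steps before computing climatologies on it.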
[21]:
ds = cat.atmosphere['20_century_reanalysis_pressure_levels_large_area'].to_dask()
ds
[21]:
<xarray.Dataset>
Dimensions:    (time: 525952, level: 17, latitude: 24, longitude: 45)
Coordinates:
  * latitude   (latitude) float32 40.0 41.0 42.0 43.0 ... 60.0 61.0 62.0 63.0
  * level      (level) float64 1.0 5.0 10.0 20.0 ... 700.0 800.0 900.0 1e+03
  * longitude  (longitude) float32 -96.0 -95.0 -94.0 -93.0 ... -54.0 -53.0 -52.0
  * time       (time) datetime64[ns] 1836-01-01 ... 2015-12-31T21:00:00
Data variables:
    air        (time, level, latitude, longitude) float32 dask.array<chunksize=(100, 1, 24, 45), meta=np.ndarray>
    hgt        (time, level, latitude, longitude) float32 dask.array<chunksize=(100, 1, 24, 45), meta=np.ndarray>
    omega      (time, level, latitude, longitude) float32 dask.array<chunksize=(100, 1, 24, 45), meta=np.ndarray>
    rhum       (time, level, latitude, longitude) float32 dask.array<chunksize=(100, 1, 24, 45), meta=np.ndarray>
Attributes: (12/25)
    Conventions:                     CF-1.2
    DODS_EXTRA.Unlimited_Dimension:  time
    References:                      https://www.esrl.noaa.gov/psd/data/gridd...
    assimilation_algorithm:          Ensemble Kalman Filter with 4DIAU
    citation:                        Compo,G.P. <https://www.esrl.noaa.gov/ps...
    citation1:                       Slivinski, L. C, G. P. Compo, J. S. Whit...
    ...                              ...
    product:                         reanalysis
    source:                          20CRv3si 2018, Ensemble Kalman Filter, o...
    spatial_resolution:              1.0 degree
    standard_name_vocabulary:        NetCDF Climate and Forecast (CF) Metadat...
    title:                           8x Daily NOAA/CIRES/DOE 20th Century Rea...
    version:                         3si
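The two pressure-level entries expose the same variables and dimensions but are chunked differently: (29200, 1, 24, 25) versus (100, 1, 24, 45). The sketch below uses only the shapes printed in the two reprs and assumes float32 storage (4 bytes per value) and uncompressed sizes. It estimates how many chunks, and how many bytes, each layout forces Dask to read for two typical requests: one map (one time step, one level) and one full time series at a single grid point and level. The helper names are illustrative only.

```python
import math

N_TIME = 525952      # length of the `time` dimension printed above
FLOAT32_BYTES = 4    # all four variables are float32

def chunk_bytes(chunks):
    """Uncompressed size of one chunk, in bytes."""
    return math.prod(chunks.values()) * FLOAT32_BYTES

def chunks_read(chunks, request):
    """(number of chunks, uncompressed bytes) touched by a request.

    `request` gives, per dimension, how many contiguous indices are needed,
    assuming the request starts on a chunk boundary. In the usual case each
    touched chunk is fetched and decompressed in full, even for one value.
    """
    n = 1
    for dim, size in request.items():
        n *= math.ceil(size / chunks[dim])
    return n, n * chunk_bytes(chunks)

# Chunk shapes copied from the two reprs (time, level, latitude, longitude)
pressure_levels = {"time": 29200, "level": 1, "latitude": 24, "longitude": 25}
large_area = {"time": 100, "level": 1, "latitude": 24, "longitude": 45}

one_map = {"time": 1, "level": 1, "latitude": 24, "longitude": 45}        # 1 step, full area
one_series = {"time": N_TIME, "level": 1, "latitude": 1, "longitude": 1}  # 1 point, all steps

for name, chunks in [("pressure_levels", pressure_levels), ("large_area", large_area)]:
    for label, request in [("map", one_map), ("series", one_series)]:
        n, nbytes = chunks_read(chunks, request)
        print(f"{name:16s} {label:6s} {n:5d} chunks {nbytes / 1e6:9.2f} MB")
```

Under these assumptions, one map costs 2 chunks (about 140 MB uncompressed) with the first layout versus a single chunk of about 0.43 MB with the second, while a full time series at one point needs 19 chunks versus 5260. The first layout therefore favours long time series at a point, and the second favours maps of the whole area at a few time steps.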
[22]:
ds = cat.atmosphere['terraclimate'].to_dask()
ds
[22]:
<xarray.Dataset>
Dimensions:                 (time: 744, lat: 4320, lon: 8640, crs: 1)
Coordinates:
  * crs                     (crs) int16 3
  * lat                     (lat) float64 89.98 89.94 89.9 ... -89.94 -89.98
  * lon                     (lon) float64 -180.0 -179.9 -179.9 ... 179.9 180.0
  * time                    (time) datetime64[ns] 1958-01-01 ... 2019-12-01
Data variables: (12/18)
    aet                     (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    def                     (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    pdsi                    (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    pet                     (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    ppt                     (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    ppt_station_influence   (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    ...                      ...
    tmin                    (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    tmin_station_influence  (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    vap                     (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    vap_station_influence   (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    vpd                     (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    ws                      (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
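TerraClimate uses a global 4320 × 8640 grid, i.e. 180°/4320 = 360°/8640 = 1/24° (2.5 arc-minutes), and 744 monthly steps (62 years × 12). The printed coordinates (89.98, 89.94, 89.9, … and -180.0, -179.9, …) are consistent with cell centres at 90 − (i + 0.5)/24 and −180 + (j + 0.5)/24. Assuming that layout, the sketch below computes the row and column containing a point and the (1440 × 1440) spatial chunk it falls in. This is roughly what `ds.sel(lat=..., lon=..., method='nearest')` resolves for you. The point `(45.51, -73.6)` is an arbitrary illustration.

```python
import math

N_LAT, N_LON = 4320, 8640   # grid size from the repr above
CELLS_PER_DEGREE = 24       # 4320 / 180 == 8640 / 360 == 24
SPATIAL_CHUNK = 1440        # lat/lon chunk size from the repr above

def grid_index(lat, lon):
    """Row/column of the cell containing (lat, lon).

    Assumes cell centres at lat = 90 - (i + 0.5) / 24 (north to south)
    and lon = -180 + (j + 0.5) / 24 (west to east).
    """
    i = math.floor((90.0 - lat) * CELLS_PER_DEGREE)
    j = math.floor((lon + 180.0) * CELLS_PER_DEGREE)
    # Clamp so that the south pole and lon = 180 map to the edge cells
    i = min(max(i, 0), N_LAT - 1)
    j = min(max(j, 0), N_LON - 1)
    return i, j

def cell_centre(i, j):
    """Centre coordinates of cell (i, j) under the same assumption."""
    return 90.0 - (i + 0.5) / CELLS_PER_DEGREE, -180.0 + (j + 0.5) / CELLS_PER_DEGREE

i, j = grid_index(45.51, -73.6)
print(i, j)                                          # 1067 2553
print(tuple(round(x, 3) for x in cell_centre(i, j)))  # (45.521, -73.604)
print(i // SPATIAL_CHUNK, j // SPATIAL_CHUNK)        # 0 1, the spatial chunk holding the point
```

Because every chunk holds 12 months, a full 62-year series at this point touches 62 chunks of one variable, each about 100 MB uncompressed (12 × 1440 × 1440 × 4 bytes).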
[23]:
# A new dataset covering all RCPs is being created and will replace this one
ds = cat.climate_change['rcp45_day_NAM_22i_raw_zarr'].to_dask()
ds

[23]:
<xarray.Dataset>
Dimensions:    (lat: 258, lon: 600, member_id: 3, time: 34698, bnds: 2)
Coordinates:
  * lat        (lat) float64 12.12 12.38 12.62 12.88 ... 75.62 75.88 76.12 76.38
  * lon        (lon) float64 -171.9 -171.6 -171.4 ... -22.62 -22.38 -22.12
  * member_id  (member_id) <U20 'CanESM2.CRCM5-OUR' ... 'GFDL-ESM2M.CRCM5-OUR'
  * time       (time) datetime64[ns] 2006-01-01T12:00:00 ... 2100-12-31T12:00:00
    time_bnds  (time, bnds) datetime64[ns] dask.array<chunksize=(17349, 2), meta=np.ndarray>
Dimensions without coordinates: bnds
Data variables: (12/15)
    hurs       (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    huss       (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    pr         (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    prec       (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    ps         (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    rsds       (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    ...         ...
    tasmin     (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    temp       (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    tmax       (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    tmin       (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    uas        (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    vas        (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
Attributes: (12/23)
    CORDEX_domain:                  NAM-22
    contact:                        {"GFDL-ESM2M.CRCM5-OUR": "biner.sebastien...
    creation_date:                  {"GFDL-ESM2M.CRCM5-OUR": "2019-02-12 15:2...
    driving_experiment:             {"GFDL-ESM2M.CRCM5-OUR": "GFDL-ESM2M,rcp4...
    driving_experiment_name:        rcp45
    driving_model_ensemble_member:  {"GFDL-ESM2M.CRCM5-OUR": "r1i1p1", "CanES...
    ...                             ...
    references:                     {"GFDL-ESM2M.CRCM5-OUR": "http://www.oura...
    title:                          {"GFDL-ESM2M.CRCM5-OUR": "NA-CORDEX Raw N...
    tracking_id:                    {"GFDL-ESM2M.CRCM5-OUR": "5139ec82-c55f-4...
    version:                        {"GFDL-ESM2M.CRCM5-OUR": "1.1", "CanESM2....
    zarr-dataset-reference:         For dataset documentation, see DOI https:...
    zarr-version:                   1.0
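Several global attributes of this multi-member dataset (`contact`, `creation_date`, `driving_model_ensemble_member`, …) appear to hold one value per `member_id`, serialized as a JSON object keyed by member name. The repr is truncated, so this is an assumption. If it holds, the standard `json` module turns such an attribute back into a dictionary. The helper `per_member_attr` and the attribute string below are hypothetical stand-ins, not the real values.

```python
import json

def per_member_attr(raw: str) -> dict:
    """Decode a per-member attribute stored as a JSON object string.

    Returns an empty dict when the value is not a JSON object
    (e.g. plain attributes such as `driving_experiment_name`).
    """
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    return value if isinstance(value, dict) else {}

# Hypothetical attribute value; with the real data, pass
# ds.attrs["driving_model_ensemble_member"] instead.
raw = '{"GFDL-ESM2M.CRCM5-OUR": "r1i1p1", "OTHER-MEMBER": "placeholder"}'
members = per_member_attr(raw)
print(members["GFDL-ESM2M.CRCM5-OUR"])  # r1i1p1
print(per_member_attr("rcp45"))          # {}
```

Keys of the decoded dictionary can then be matched against `ds.member_id.values` to attach metadata to each ensemble member.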
[24]:
# Sample of MELCC hydrometric data; the dataset is still incomplete and data from other providers will be added
ds = cat.hydrology['melcc'].to_dask()
ds

[24]:
<xarray.Dataset>
Dimensions:                 (basin_id: 470, time: 41007)
Coordinates: (12/16)
    _last_update_timestamp  (basin_id) datetime64[ns] dask.array<chunksize=(470,), meta=np.ndarray>
    aggregation             (basin_id) <U1 dask.array<chunksize=(470,), meta=np.ndarray>
  * basin_id                (basin_id) <U6 '010101' '010801' ... '135201'
    data_type               (basin_id) <U1 dask.array<chunksize=(470,), meta=np.ndarray>
    drainage_area           (basin_id) float32 dask.array<chunksize=(470,), meta=np.ndarray>
    end_date                (basin_id) datetime64[ns] dask.array<chunksize=(470,), meta=np.ndarray>
    ...                      ...
    regulated               (basin_id) <U1 dask.array<chunksize=(470,), meta=np.ndarray>
    source                  (basin_id) <U1 dask.array<chunksize=(470,), meta=np.ndarray>
    start_date              (basin_id) datetime64[ns] dask.array<chunksize=(470,), meta=np.ndarray>
  * time                    (time) datetime64[ns] 1910-01-01 ... 2022-04-09
    timestep                (basin_id) <U1 dask.array<chunksize=(470,), meta=np.ndarray>
    units                   (basin_id) <U1 dask.array<chunksize=(470,), meta=np.ndarray>
Data variables:
    flag                    (time, basin_id) <U1 dask.array<chunksize=(2563, 59), meta=np.ndarray>
    value                   (time, basin_id) float32 dask.array<chunksize=(5126, 59), meta=np.ndarray>
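`basin_id` is stored as six-character strings (`<U6`, e.g. `'010101'`, `'135201'`), so leading zeros are significant, and selecting with the integer `10101` will not match. The small helper below is hypothetical and not part of the catalog. It normalizes station numbers to that format before they are passed to `ds.sel(basin_id=...)`:

```python
def to_basin_id(station) -> str:
    """Normalize a station number to the 6-character, zero-padded basin_id format."""
    text = str(station).strip()
    if not text.isdigit() or len(text) > 6:
        raise ValueError(f"not a valid basin number: {station!r}")
    return text.zfill(6)

print(to_basin_id(10101))     # 010101
print(to_basin_id("135201"))  # 135201
# With the dataset loaded, for example:
# ds.sel(basin_id=[to_basin_id(s) for s in (10101, 10801)]).value
```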